NEURAL NETWORK METHODS TO PREDICT ENZYME INHIBITOR OR RECEPTOR LIGAND POTENCY



This invention was made with United States government support under Grant NIH
GM41916 from the NIH. The United States Government may have certain rights in this
invention.

BACKGROUND OF THE INVENTION 

The present invention relates to a method for calculating the binding free energy for
interactions between biomolecules. More particularly, the present invention relates to a
method that employs computational neural networks to discover quantum mechanical
features of enzyme active site transition states, as well as quantum mechanical features
required for binding of putative enzyme inhibitors. The present method is also applicable
to discovering quantum mechanical features required for the binding of a potential ligand
to a biological receptor. Computer-readable media may be incorporated with information
enabling the method of the present invention to be performed on a general-purpose
computer.

Enzymatically catalyzed reactions are characterized by geometric and electrostatic
distortions of a substrate molecule into a transition state. The formation and stabilization
of these transition states by enzymes are accompanied by increases in the rate of catalysis
on the order of $10^{10}$ to $10^{15}$ times faster than the uncatalyzed reaction. It is thought that an
enzyme binds the transition state of a substrate molecule more tightly than either the
substrate or the product. As a result, chemically stable molecules that mimic the substrate
transition state should be potent inhibitors of enzyme activity.
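
As a rough illustration of the scale of these rate enhancements, the sketch below converts a rate enhancement factor into the corresponding free energy of transition state stabilization, RT ln(factor), as in standard transition state theory. The temperature of 298.15 K and the function name are assumptions made for this example and do not appear in the specification.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)

def stabilization_energy_kj(rate_enhancement, temperature_k=298.15):
    """Free energy (kJ/mol) equivalent to a rate enhancement factor: RT*ln(factor).

    The default temperature is an assumption for illustration only.
    """
    return R * temperature_k * math.log(rate_enhancement) / 1000.0

low = stabilization_energy_kj(1e10)   # lower end of the quoted range
high = stabilization_energy_kj(1e15)  # upper end of the quoted range
print(f"{low:.1f} to {high:.1f} kJ/mol")
```

On these assumptions the quoted range corresponds to several tens of kJ/mol of additional stabilization, which is why a stable mimic of the transition state is expected to bind far more tightly than the substrate.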

The de novo design of transition state inhibitors requires accurate models of the 
enzyme-stabilized transition state. Advances in theory and computational chemistry have 
produced good models of stable molecules and enzymatic transition states from kinetic 
isotope effect experiments. However, the development of computational and theoretical 
methods of prediction of the binding constant of a putative inhibitor, prior to synthesis,
[...] chemical libraries for transition state mimics.

[...] of the development of methods to predict the binding of
biological agents to enzymes or receptors.

[...] categories. The first is the use of docking or molecular dynamics studies to investigate the
interactions of substrates with a variety of biological molecules, such as enzymes or
receptor sites. The second is the use of quantitative structure-activity relationships
(hereinafter "QSARs"), which usually investigate the properties of potential therapeutic
agents in the absence of their biological target. Each of these methods has advantages and
disadvantages.

The concept of docking or molecular dynamics studies is that [...]
physical laws of motion or static interaction can be applied to biological systems to predict
the strength of interaction of a substrate with a complex biological molecule. In general,
biological macromolecules and their substrates form a system too large for ab initio
quantum chemical methods to be used to generate electronic potential energies, and so
parametrized classical force fields are [...]
can be run. This is a massive technology with an equally huge literature. A variety of
algorithms have been developed which allow efficient integration of Newton's equations.
In addition, Monte Carlo methods are of critical importance in this field. There have also
been recent advances in mixing quantum and classical mechanics, such as the surface
hopping methods in chemical physics. Because even this calculation is challenging for
complex systems, static methods, which treat parts of a system as dielectric continua, have
been employed. These approaches have been employed in docking studies, in which
substrates are virtually oriented and bound to an active site of an enzyme or other
biomolecule.

As important as these approaches are, there are still difficulties in the application
of these approaches to drug design and analysis. First, there are a great many
approximations inherent in the development of force field and dielectric continua models.
Though these methods have been studied for many years, it is still difficult to determine which
approximations are appropriate. Second, these calculations are difficult to perform even
when the approximations are appropriately employed, and require significant computer
resources. As a result, the use of such docking studies in searches of libraries of
candidate drugs for interactions with a particular target enzyme is impractical. Third, a
structure for the biomolecule of interest is required for these docking studies. If, for
example, only a DNA sequence is available, these methods cannot be employed.

The second method for predicting biological activity, QSARs, focuses on the
[...] molecule. The concept assumes that there is a database of experimental evidence
from which inference can be drawn as to the effectiveness of other molecules. Specific
properties of substrate molecules, such as hydrophobicity, the presence of certain groups,
steric parameters, etc., are empirically fit to experimentally determined biological activities.
The assumption is that once this fitting is appropriately performed, an accurate prediction
of biological activity of a previously untested molecule can be made by examining the
[...] for the same properties. [...]
structural parameters used have been expanded to include quantum mechanical features
of substrates such as electrostatic potential surfaces. For example, point-by-point
comparison of quantum mechanical electrostatic potentials on molecular van der Waals
surfaces has been used to predict inhibition constants for transition state inhibitors for the
reactions catalyzed by AMP deaminase and AMP nucleosidase. While these approaches are
powerful, there are a number of difficulties. First, one must be able to identify a specific
feature that determines bioactivity. When multiple features are involved, it is difficult to
determine how the interactions of these features affect bioactivity. Second, a QSAR
may not be able to identify bioactivity trends when a number of different mechanisms are
present (as where, for example, an enzyme protonates some substrates but not others).
Finally, in addition to the practical prediction of bioactivity among libraries of potentially
bioactive compounds, it is desirable to develop a theoretical approach that can help identify
the features of the candidate molecules that are important, and thereby help to elucidate
unknown mechanisms in the bioactivity.






The present invention is based on prior work on molecular similarity measures
which compared electrostatic potential surfaces on the van der Waals surface of two different
molecules. Two different molecules having similar [...] were
found to have similar binding properties. Therefore, strong electrostatic similarity to an
experimentally determined transition state should result in strong binding, and a powerful
transition state inhibitor. The similarity measure is defined as follows:



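$$S_e = \frac{\sum_{i=1}^{n_A}\sum_{j=1}^{n_B} \epsilon_i^A\,\epsilon_j^B \exp(-\alpha r_{ij}^2)}{\sqrt{\sum_{i=1}^{n_A}\sum_{j=1}^{n_A} \epsilon_i^A\,\epsilon_j^A \exp(-\alpha r_{ij}^2)\;\sum_{i=1}^{n_B}\sum_{j=1}^{n_B} \epsilon_i^B\,\epsilon_j^B \exp(-\alpha r_{ij}^2)}}$$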
where $\epsilon_i^A$ is an electrostatic potential on molecule A at position i, $\epsilon_j^B$ is an electrostatic
potential on molecule B at position j, $r_{ij}$ is a distance between points i and j on a surface,
and $\alpha$ is a decay constant employed so that points that are very far apart do not strongly
affect the similarity measure. The similarity measures were first applied to transition state
inhibitors for the reactions catalyzed by AMP deaminase, adenosine deaminase, and AMP
nucleosidase. Transition state structures for each enzyme were obtained by kinetic isotope
experiments (Kline et al., J. Biol. Chem. 269:22385-22390 (1994); Ehrlich et al., Biochem.
33:8890-8896 (1994)). Electrostatic potentials were then calculated for the transition
states, the substrates, and the putative inhibitors. These were obtained using the
GAUSSIAN 94 quantum chemistry package (from Gaussian, Inc., Pittsburgh, PA).
Minimal basis sets (STO-3G) were used in the initial studies, and were thereafter [...]
using higher order basis sets. The molecules were oriented with respect to each other to
maximize geometric overlap, and the electrostatic similarities were calculated.
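
A minimal sketch of such a similarity calculation is given below. It assumes a normalized form in which each pair of surface points contributes the product of its potentials, weighted by a Gaussian decay in the squared distance between the points, and in which the cross term is divided by the geometric mean of the two self terms. That specific form, the function name `electrostatic_similarity`, and the default decay constant are assumptions made for illustration, not a statement of the exact measure used in the prior work.

```python
import math

def electrostatic_similarity(points_a, potentials_a, points_b, potentials_b, alpha=1.0):
    """Gaussian-weighted similarity between two sets of surface potentials.

    points_*     : lists of (x, y, z) surface points after alignment
    potentials_* : electrostatic potential at each corresponding point
    alpha        : decay constant (hypothetical default)
    """
    def weighted_sum(pts1, pots1, pts2, pots2):
        total = 0.0
        for p, ep in zip(pts1, pots1):
            for q, eq in zip(pts2, pots2):
                r_squared = sum((u - v) ** 2 for u, v in zip(p, q))
                total += ep * eq * math.exp(-alpha * r_squared)
        return total

    cross = weighted_sum(points_a, potentials_a, points_b, potentials_b)
    self_a = weighted_sum(points_a, potentials_a, points_a, potentials_a)
    self_b = weighted_sum(points_b, potentials_b, points_b, potentials_b)
    return cross / math.sqrt(self_a * self_b)

# Toy data: three surface points with made-up potentials.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
potentials = [0.2, -0.1, 0.4]
same = electrostatic_similarity(points, potentials, points, potentials)
opposite = electrostatic_similarity(points, potentials, points, [-e for e in potentials])
```

With this normalization a molecule compared with itself scores 1, and a molecule compared with its sign-inverted potential surface scores -1. Every surface point is weighted equally, whether or not it lies near the binding region.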
[...] reactions, the calculated numerical [...]

[...] energy versus electrostatic similarity (Se) for the AMP [...] transition
state. Similarly, Fig. 2 shows a plot of experimentally determined binding free energy [...]
dihydropurine ribonucleoside is a significant outlier, but the remaining results are quite [...].

Despite these encouraging initial results, it soon became apparent that the similarity
measure does not accurately predict all cases of inhibitor-enzyme binding. The reason is
that it treats all points on the van der Waals surface equivalently. It is quite
possible to have a perfect configuration for binding in the region of an inhibitor molecule
[...] with the active site, but have significant differences remote from this site. As
a result, the similarity measure would produce a result that would predict weaker binding
than the actual binding free energy. Similarly, a molecule that initially looks very different
from a transition state inhibitor could be changed electrostatically by its interaction with
the enzyme, by, e.g., protonation, to a form that might have a high binding free energy.
Again, the similarity measure would predict a weaker binding than what would be
measured experimentally. Furthermore, there are many reasonable algebraic similarity
measures that may be applied in each case, and choosing the most appropriate measure
would require extensive computational resources.

An artificial neural network is a computer algorithm which, during a training
process, can learn features of input patterns and associate these with an output. After the
learning phase is completed, the trained network enables the computer to predict an output
for a pattern not included in the training process. Neural networks have been used in a
small number of cases to study biological activity prediction. For example, Kohonen
self-organizing maps have been used to transform the three-dimensional surface of
biomolecules to a two-dimensional projection (Gasteiger et al., J. Am. Chem. Soc.
116:4608-4620 (1994)). Similarly, the molecular [...] van der
Waals surface has been collapsed onto a series of 12 autocorrelation coefficients, and these
were used in a neural network (Wagener et al., J. Am. Chem. Soc. 117:7769-7775
(1995)). In both these cases, potentially useful data were discarded [...]
dimensional surface information was converted to a two-dimensional representation.
Neural networks have also been used to predict the mode of action of chemotherapeutic
agents (Weinstein et al., Stem Cells 12:13-22 (1994)). Finally, neural networks have been
used to predict biological activity from discrete QSAR descriptions of molecular structure
(So and Richards, J. Med. Chem. 35:3201-3207 (1992)). However, this approach fails
if the correct QSAR is not selected.

It is therefore desirable to have a method that can accurately predict binding free
energy for a wide variety of potential inhibitors. It is also desirable to have a method for
determination of binding free energy that identifies those regions of a potential inhibitor
or other bioactive molecule that are especially important in binding, and thereby helps
elucidate unknown binding features. Furthermore, it is desirable to have a method for
determining binding free energy that would adjust itself in each case to the form most
suited to that particular case.

SUMMARY AND OBJECTS OF THE INVENTION

Accordingly, it is an object of the present invention to overcome the limitations of 
the prior art. 

It is another object of the present invention to provide a method for determining the free
energy of binding of a substrate of known structure to an enzyme. 

It is another object of the present invention to provide a method for determining the
free energy of binding of an inhibitor of known structure to an enzyme. 

It is another object of the present invention to provide a method for determining the 
free energy of binding of a ligand to a receptor. 

It is another object of the present invention to provide a computer-readable medium
encoded with information that enables a general-purpose computer to perform the method
of the present invention.

Briefly stated, a new method to analyze and predict the binding energy for enzyme-
[...] interactions is presented. Computational neural networks are
employed to discover quantum mechanical features of [...]
[...] binding. The method [...]
between the quantum mechanical structure of the inhibitor and the strength of binding.
Feed-forward neural networks with back propagation of error can be trained to recognize
the quantum mechanical electrostatic potential at the entire van der Waals surface, rather
[...]
of interactions between the enzyme and a group of novel inhibitors. [...]
results show that the neural networks can predict with quantitative accuracy the binding
strength of new inhibitors. The method is in fact able to predict [...]
energy of the transition state, when trained with less tightly bound inhibitors. The present
method is also applicable to prediction of the binding free energy of a ligand to a receptor
[...]
permit evaluation of chemical libraries of potential inhibitory, agonistic, or antagonistic
agents. The method is amenable to incorporation in a computer-readable medium
accessible by general-purpose computers.
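
The kind of feed-forward network with back propagation of error referred to above can be sketched as follows. This is a minimal illustration only: the single hidden layer, the tanh activation, the learning rate, and the toy input vectors and binding energies are all assumptions chosen for the example, and none of them is taken from the specification.

```python
import math
import random

def train_network(inputs, targets, n_hidden=4, rate=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer feed-forward network by back propagation of error.

    Returns a function mapping an input vector to a predicted output.
    All hyperparameters here are hypothetical defaults.
    """
    rng = random.Random(seed)
    n_in = len(inputs[0])
    # Each hidden row holds n_in weights followed by a bias term.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    # Output weights: one per hidden unit, followed by a bias term.
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w1]
        output = sum(w * h for w, h in zip(w2, hidden)) + w2[-1]
        return hidden, output

    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            hidden, output = forward(x)
            error = output - t
            # Propagate the output error back through the tanh hidden units.
            deltas = [error * w2[j] * (1.0 - hidden[j] ** 2) for j in range(n_hidden)]
            for j in range(n_hidden):
                w2[j] -= rate * error * hidden[j]
            w2[-1] -= rate * error
            for j, row in enumerate(w1):
                for i in range(n_in):
                    row[i] -= rate * deltas[j] * x[i]
                row[-1] -= rate * deltas[j]

    return lambda x: forward(x)[1]

def mean_squared_error(predict, inputs, targets):
    return sum((predict(x) - t) ** 2 for x, t in zip(inputs, targets)) / len(targets)

# Made-up surface potentials (inputs) and binding free energies in units of RT (targets).
toy_inputs = [[0.1, 0.2, 0.3], [0.9, 0.1, 0.4], [0.5, 0.5, 0.5], [0.2, 0.8, 0.1]]
toy_targets = [-6.0, -9.5, -8.0, -5.0]

untrained = train_network(toy_inputs, toy_targets, epochs=0)
trained = train_network(toy_inputs, toy_targets)
```

Here `trained` plays the role of the trained network: it can be called on the input vector of a molecule that was not part of the training set to obtain a predicted binding free energy.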

According to an embodiment of the present invention, a method for determining the
free energy of binding of a potential ligand to a receptor comprises the steps of obtaining,
for each of two or more actual receptor ligands, at least one of a structure and a free energy
of binding to the receptor, such that each of the two or more actual receptor ligands has a
[...] orienting structures
of the two or more actual receptor ligands for maximum geometric coincidence with each
other, defining an electrostatic potential at each of more than one point on a van der
Waals surface of each of the actual receptor ligands, thereafter, mapping each of the
electrostatic potentials of each of the actual receptor ligands onto a geometric surface of
one of the two or more actual receptor ligands, each of the two or more actual receptor
ligands being thereby described by an identical surface geometry but a different
electrostatic potential surface, and each of the electrostatic potentials being described by
positional information relating the electrostatic potentials to the geometric surface,
thereafter, inputting the electrostatic potentials, the positional information, and the known
free energy of binding of one of the two or more actual receptor ligands into a neural
network, thereafter, training the neural network until the neural network predicts the free
energy of binding of the one of the two or more actual receptor ligands, repeating the steps
of inputting and training for each of the remaining ones of the two or more actual receptor
ligands to produce a trained network, thereafter, determining a potential ligand electrostatic
potential at each of more than one point on a van der Waals surface of the potential ligand,
the potential ligand having a known structure and an unknown free energy of binding to the
receptor, orienting the structure of the potential ligand for maximum geometric coincidence
with the structures of the two or more actual receptor ligands, thereafter, mapping each of
the electrostatic potentials of the potential ligand onto a geometric surface of one of the two
or more actual receptor ligands, the potential ligand having a surface geometry identical to
that of the two or more actual receptor ligands, but a different electrostatic potential
surface, and each of the electrostatic potentials of the potential ligand being described by
positional information relating the electrostatic potentials to the geometric surface,
thereafter, inputting the electrostatic potentials and the positional information of the
electrostatic potentials of the potential ligand into the trained network, and using the trained
network to calculate a free energy of binding of the potential ligand to the receptor.

According to another embodiment of the present invention, a method for
determining the free energy of binding of a potential ligand to a receptor comprises the
steps of obtaining a structure for the potential ligand, orienting structures of two or more
actual receptor ligands for the receptor for maximum geometric coincidence with each
other, each of the two or more actual receptor ligands having a known structure and a
known free energy of binding to the receptor, determining an electrostatic potential at each
of more than one point on a van der Waals surface of each of the actual receptor ligands,
thereafter, mapping each of the electrostatic potentials of each of the actual receptor ligands
onto a geometric surface of one of the two or more actual receptor ligands, each of the two
or more actual receptor ligands being thereby described by an identical surface geometry
but a different electrostatic potential surface, and each of the electrostatic potentials being
described by positional information relating the electrostatic potentials to the geometric
surface, thereafter, inputting the electrostatic potentials, the positional information, and the
known free energy of binding of one of the two or more actual receptor ligands into a
neural network, thereafter, training the neural network until the neural network predicts the
free energy of binding of the one of the two or more actual receptor ligands, repeating the
steps of inputting and training for each of the remaining ones of the two or more actual
receptor ligands to produce a trained network, thereafter, determining a potential ligand
electrostatic potential at each of more than one point on a van der Waals surface of the
potential ligand, the potential ligand having an unknown free energy of binding to the
receptor, orienting the structure of the potential ligand for maximum geometric coincidence
with the structures of the two or more actual receptor ligands, thereafter, mapping each of
the electrostatic potentials of the potential ligand onto a geometric surface of one of the two
or more actual receptor ligands, the potential ligand having a surface geometry identical to
that of the two or more actual receptor ligands, but a different electrostatic potential
surface, and each of the electrostatic potentials of the potential ligand being described by
positional information relating the electrostatic potentials to the geometric surface,
thereafter, inputting the electrostatic potentials and the positional information of the
electrostatic potentials of the potential ligand into the trained network, and using the trained
network to calculate a free energy of binding of the potential ligand to the receptor.

According to another embodiment of the present invention, a computer readable
medium comprises computer-readable information, the information capable of interacting
with a computer to produce an output, the output being a calculated free energy of binding
of a potential ligand to a receptor, the output being calculated by orienting structures of the
two or more actual receptor ligands for maximum geometric coincidence with each other,
each of the two or more actual receptor ligands having a known structure and a known free
energy of binding to the receptor, determining an electrostatic potential at each of more
than one point on a van der Waals surface of each of the actual receptor ligands, thereafter,
mapping each of the electrostatic potentials of each of the actual receptor ligands onto a
geometric surface of one of the two or more actual receptor ligands, each of the two or
more actual receptor ligands being thereby described by an identical surface geometry but
a different electrostatic potential surface, and each of the electrostatic potentials being
described by positional information relating the electrostatic potentials to the geometric
surface, thereafter, inputting the electrostatic potentials, the positional information, and the
known free energy of binding of one of the two or more actual receptor ligands into a
neural network, thereafter, training the neural network until the neural network predicts the
free energy of binding of the one of the two or more actual receptor ligands, repeating the
steps of inputting and training for each of the remaining ones of the two or more actual
receptor ligands to produce a trained network, thereafter, determining a potential ligand
electrostatic potential at each of more than one point on a van der Waals surface of the
potential ligand, the potential ligand having a known structure and an unknown free energy
of binding to the receptor, orienting the structure of the potential ligand for maximum
geometric coincidence with the structures of the two or more actual receptor ligands,
thereafter, mapping each of the electrostatic potentials of the potential ligand onto a
geometric surface of one of the two or more actual receptor ligands, the potential ligand
having a surface geometry identical to that of the two or more actual receptor ligands, but
a different electrostatic potential surface, and each of the electrostatic potentials of the
potential ligand being described by positional information relating the electrostatic
potentials to the geometric surface, thereafter, inputting the electrostatic potentials and the
positional information of the electrostatic potentials of the potential ligand into the trained
network, and using the trained network to calculate a free energy of binding of the potential
ligand to the receptor.

According to another embodiment of the present invention, a method for
determining a free energy of binding of a potential transition-state inhibitor to an enzyme
comprises the steps of obtaining, for each of two or more enzyme substrates or inhibitors,
at least one of a structure [...]
two or more enzyme substrates [...]
inhibitors for maximum geometric coincidence with each other, determining an
electrostatic potential at each of more than one point on a van der Waals surface of each of
the enzyme substrates or inhibitors, thereafter, mapping each of the electrostatic potentials
of each of the enzyme substrates or inhibitors onto a geometric surface of a transition state
inhibitor, each of the enzyme substrates or inhibitors being thereby described by an
identical surface geometry but a different electrostatic potential surface, and each of the
electrostatic potentials being described by positional information relating the electrostatic
potentials to the geometric surface of the transition state inhibitor, thereafter, inputting the
electrostatic potentials, the positional information, and the known free energy of binding
of one of the two or more enzyme substrates or inhibitors into a neural network, thereafter,
training the neural network until the neural network predicts the free energy of binding of
the one of the two or more enzyme substrates or inhibitors, repeating the steps of inputting
and training for each of the remaining ones of the two or more enzyme substrates or
inhibitors to produce a trained network, thereafter, determining a potential transition-state
inhibitor electrostatic potential at each of more than one point on a van der Waals surface
of the potential transition-state inhibitor, the potential transition-state inhibitor having a
known structure and an unknown free energy of binding to the enzyme, orienting the
structure of the potential transition-state inhibitor for maximum geometric coincidence with
the structures of the two or more enzyme substrates or inhibitors, thereafter, mapping each
of the electrostatic potentials of the potential transition-state inhibitor onto a geometric
surface of one of the two or more enzyme substrates or inhibitors, such that the
potential transition-state inhibitor has a surface geometry identical to that of the two or
more actual receptor transition-state inhibitors, but a different electrostatic potential
surface, and each of the electrostatic potentials of the potential transition-state inhibitor is
described by positional information relating the electrostatic potentials to the geometric
surface of the two or more enzyme substrates or inhibitors, thereafter, inputting the
electrostatic potentials and the positional information of the electrostatic potentials of the
potential transition-state inhibitor into the trained network, and using the trained network
to calculate a free energy of binding of the potential transition-state inhibitor to the
enzyme.

According to another embodiment of the present invention, a method for
determining the free energy of binding of a potential transition-state inhibitor to an enzyme
comprises the steps of obtaining a structure for the potential transition-state inhibitor,
orienting structures of two or more enzyme substrates or inhibitors for the enzyme for
maximum geometric coincidence with each other, each of the two or more enzyme
substrates or inhibitors having a known structure and a known free energy of binding to the
enzyme, determining an electrostatic potential at each of more than one point on a van der
Waals surface of each of the enzyme substrates or inhibitors, thereafter, mapping each of
the electrostatic potentials of each of the enzyme substrates or inhibitors onto a geometric
surface of one of the two or more enzyme substrates or inhibitors, each of the two or more
enzyme substrates or inhibitors being thereby described by an identical surface geometry
but a different electrostatic potential surface, and each of the electrostatic potentials being
described by positional information relating the electrostatic potentials to the geometric
surface, thereafter, inputting the electrostatic potentials, the positional information, and the
known free energy of binding of one of the two or more enzyme substrates or inhibitors
into a neural network, thereafter, training the neural network until the neural network
predicts the free energy of binding of the one of the two or more enzyme substrates or
inhibitors, repeating the steps of inputting and training for each of the remaining ones of
the two or more enzyme substrates or inhibitors to produce a trained network, thereafter,
determining a potential transition-state inhibitor electrostatic potential at each of more
than one point on a van der Waals surface of the potential transition-state inhibitor, the
potential transition-state inhibitor having an unknown free energy of binding to the enzyme,
orienting the structure of the potential transition-state inhibitor for maximum geometric
coincidence with the structures of the two or more enzyme substrates or inhibitors,
thereafter, mapping each of the electrostatic potentials of the potential transition-state
inhibitor onto a geometric surface of one of the two or more enzyme substrates or
inhibitors, the potential transition-state inhibitor having a surface geometry identical to that
of the two or more enzyme substrates or inhibitors, but a different electrostatic potential
surface, and each of the electrostatic potentials of the potential transition-state inhibitor
being described by positional information relating the electrostatic potentials to the
geometric surface, thereafter, inputting the electrostatic potentials and the positional
information of the electrostatic potentials of the potential transition-state inhibitor into the
trained network, and using the trained network to calculate a free energy of binding of the
potential transition-state inhibitor to the enzyme.

According to another embodiment of the present invention, a computer readable
medium comprises computer-readable information, the information capable of interacting
with a computer to produce an output, the output being a calculated free energy of binding
of a potential transition-state inhibitor to an enzyme, the output being calculated by orienting
structures of the two or more actual receptor ligands for maximum geometric coincidence
with each other, each of the two or more actual ligands having a known structure and a
known free energy of binding to the enzyme, determining an electrostatic potential at each
of more than one point on a van der Waals surface of each of the enzyme substrates or
inhibitors, thereafter, mapping each of the electrostatic potentials of each of the enzyme
substrates or inhibitors onto a geometric surface of one of the two or more enzyme
substrates or inhibitors, each of the two or more enzyme substrates or inhibitors being
thereby described by an identical surface geometry but a different electrostatic potential
surface, and each of the electrostatic potentials being described by positional information
relating the electrostatic potentials to the geometric surface, thereafter, inputting the
electrostatic potentials, the positional information, and the known free energy of binding
of one of the two or more enzyme substrates or inhibitors into a neural network, thereafter,
training the neural network until the neural network predicts the free energy of binding of
the one of the two or more enzyme substrates or inhibitors, repeating the steps of inputting
and training for each of the remaining ones of the two or more enzyme substrates or
inhibitors to produce a trained network, thereafter, determining a potential transition-state
inhibitor electrostatic potential at each of more than one point on a van der Waals surface
of the potential receptor ligand, the potential receptor ligand having a known structure and
an unknown free energy of binding to the enzyme, orienting the structure of the potential
transition-state inhibitor for maximum geometric coincidence with the structures of the two
or more enzyme substrates or inhibitors, thereafter, mapping each of the electrostatic
potentials of the potential transition-state inhibitor onto a geometric surface of one of the
two or more enzyme substrates or inhibitors, the potential transition-state inhibitor having
a surface geometry identical to that of the two or more enzyme substrates or inhibitors, but
a different electrostatic potential surface, and each of the electrostatic potentials of the
potential transition-state inhibitor being described by positional information relating the
electrostatic potentials to the geometric surface, thereafter, inputting the electrostatic
potentials and the positional information of the electrostatic potentials of the potential
transition-state inhibitor into the trained network, and using the trained network to calculate
a free energy of binding of the potential transition-state inhibitor to the enzyme.

Additional [...] of the present invention [...] the description
which follows.

BRIEF DESCRIPTION OF THE DRAWINGS 






Figure 1 shows a plot of the binding free energy (ΔG/RT) for AMP nucleosidase
versus similarity measure (Se) according to the prior art.

Figure 2 shows a plot of the binding free energy (ΔG/RT) for adenosine deaminase
versus similarity measure (Se) according to the prior art.

Figure 3 shows a schematic diagram of the neural network employed in the method
of the present invention.

Figure 4 shows the structures of the methyl derivatives of the molecules used in the 
cytidine deaminase experiments according to the present invention. 

Figure 5 shows the structures of the potential inhibitors used in the NOS 
experiments according to the present invention. 

Figure 6 shows the structures of the known inhibitors of IU-hydrolase used to train
the neural network according to the present invention.

Figure 7 shows the structures of the potential inhibitors of IU-hydrolase whose
binding constants were determined according to the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 

To overcome the above-described limitations, the present inventors have tested the
ability of computational neural networks to predict binding free energy. Neural networks
are employed to investigate the properties of substrate and inhibitor molecules. Through
training algorithms that are described below, the network can find properties and areas of
molecules that are necessary for biological recognition, binding and action. The molecules
are described using general quantum mechanical descriptions of the substrate and inhibitor
molecules, and so the algorithm in a real sense "chooses its own QSAR". Because the
method focuses on substrates [...]
for application of standard quantum chemistry molecular orbital calculations, there is no
ambiguity as to choice of force fields, dielectric constants, or classical versus quantum
mechanics. In addition, complex problems associated with the binding of water and
counter ions in active sites are also avoided. These constructions have been shown to be
able to satisfy all the requirements for flexibility described above. That is, they are able to
discern from input data the specific data crucial for forming the proper conclusion; and
from the data are able to extrapolate to a relationship between input and output.

In the present work, the electrostatic potential at the van der Waals surface of a
molecule is used as the physicochemical descriptor. The entire surface for each molecule,
represented by a discrete collection of points, serves as the input to the neural network. To
preserve the geometric and electrostatic integrity of the training molecules, a collapse onto
a lower dimensional surface is avoided. After alignment of the inhibitor molecule for
maximal geometrical overlap with the transition state structure, the electrostatic potentials
on the inhibitor surface are mapped onto the van der Waals surface of the transition state.
Therefore, though an inhibitor molecule takes on the geometry of the transition state, the
electrostatic potentials decorating that surface are derived from the inhibitor itself.

The molecular electrostatic potential calculated at the van der Waals surface of the
molecules is used as a descriptor of chemical structure and properties. Such information
sheds light on the kinds of interactions a given molecule can have with the active site.
Regions with electrostatic potentials close to zero are likely to be capable of van der Waals
interactions. Regions with a partial positive or negative charge can serve as hydrogen bond
donor or acceptor sites. Regions with even greater positive or negative potentials may be
involved in coulombic interactions. The electrostatic potential also conveys information
concerning the likelihood that a particular region can undergo electrophilic or nucleophilic
attack.

The electrostatic potential surfaces are quantified as follows. After a constrained
energy minimization of a molecular structure using the GAUSSIAN 94 package
(GAUSSIAN 94, Revision C.2, Gaussian, Inc., Pittsburgh, PA), its CUBE function is used
to calculate the electron density and electrostatic potential. Since molecules described by
quantum mechanics have a finite electron density in all space, a reasonable cutoff is
required to define a molecular geometry. One can closely approximate the van der Waals
surface by finding all points around a molecule where the electron density is close to
0.002 ± δ electrons/bohr³. δ is the acceptance tolerance, since no Gaussian output will have
an electron density of exactly 0.002. The set of points thus generated will describe a
surface under which approximately 95% of the electron density resides. δ is adjusted so
that 17 points per atom are accepted, creating a fairly uniform molecular surface as shown
previously (Bagdassarian et al., J. Amer. Chem. Soc. 118:8825-8836 (1996)). The
information about a given molecular surface is described by a matrix with dimensions of
4 × n, where n is the number of points for the molecule, and the row vector of length 4
contains the x, y, z-coordinates of a given point and the electrostatic potential there.
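The surface-selection step can be sketched as follows. This is a minimal illustration, not the inventors' code: it assumes the density and potential have already been read from a cube-style grid into NumPy arrays, and the function name `surface_matrix` and its parameters are hypothetical. Instead of tuning δ by hand, it keeps the points whose density is closest to 0.002, which amounts to choosing δ just large enough to accept 17 points per atom.

```python
import numpy as np

def surface_matrix(coords, density, potential, n_atoms,
                   target_density=0.002, points_per_atom=17):
    """Return a 4 x n matrix (x, y, z, potential) of near-isodensity points.

    coords: (N, 3) grid coordinates; density, potential: (N,) grid values.
    All names and defaults here are illustrative assumptions.
    """
    coords = np.asarray(coords, dtype=float)
    density = np.asarray(density, dtype=float)
    potential = np.asarray(potential, dtype=float)
    n_keep = min(points_per_atom * n_atoms, len(density))
    # Distance of every grid point from the target isodensity value.
    deviation = np.abs(density - target_density)
    # Keeping the n_keep smallest deviations is the same as choosing the
    # tolerance delta just large enough to accept n_keep points.
    keep = np.argsort(deviation)[:n_keep]
    delta = deviation[keep].max()
    # Rows 0-2 hold x, y, z; row 3 holds the electrostatic potential.
    surface = np.vstack([coords[keep].T, potential[keep]])
    return surface, delta
```

The returned `delta` is the effective acceptance tolerance, so it can be inspected to confirm that the accepted points stay close to the 0.002 isodensity value.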

For input of the surface features of the structures into a neural network, the
molecules must be oriented for maximum geometric coincidence. This can be done in
either of two ways. In the first, the molecular stick figures are superimposed via, for
example, the algorithms provided in the Insight II package (Biosym Technologies, San
Diego, CA). One selects the obvious atoms from two molecules that are to coincide
spatially, which is a simple matter for molecules sharing a great deal of backbone
similarity. Once the superposition is achieved, the GAUSSIAN calculations are performed
with the NOSYMM feature to preserve the spatial orientations of the molecules. Only then
are the surfaces constructed from the electrostatic potential and electron density outputs.

The second way to achieve properly oriented surfaces is as follows. Once the van
der Waals surfaces of the molecules have been constructed through the procedure described
above, two molecules are spatially positioned with their geometric centers at the coordinate
origin. One molecule is held fixed while the other is rotated around its center, and for each
new position a geometric similarity measure S_g is used to gauge the degree of alignment:



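\[
S_g = \frac{\displaystyle\sum_{i=1}^{n_A}\sum_{j=1}^{n_B} \exp\left(-\alpha r_{ij}^{2}\right)}
{\sqrt{\displaystyle\sum_{i=1}^{n_A}\sum_{j=1}^{n_A} \exp\left(-\alpha r_{ij}^{2}\right)}\;
 \sqrt{\displaystyle\sum_{i=1}^{n_B}\sum_{j=1}^{n_B} \exp\left(-\alpha r_{ij}^{2}\right)}}
\]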



The double sum in the numerator is over all surface points on molecule A and on molecule
B. n_A and n_B refer to the number of surface points in molecules A and B, respectively.
r_ij² is the spatial distance squared between point i on A and point j on B. α is the length
scale that weighs the degree to which spatial distances between i and j affect S_g. The
denominator is a normalizing factor. S_g is calculated for many random orientations, and the
relative orientation with the maximum S_g is saved, as it corresponds to the orientation of
molecule A with respect to B with maximal surface coincidence. All molecules are thus
oriented to a reference target molecular surface - that of the transition state.
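The random-orientation search can be sketched in a few lines. This is a minimal illustration under stated assumptions: surfaces are (n, 3) NumPy coordinate arrays, the function names are hypothetical, the value of `alpha` and the number of trials are arbitrary, and rotations are drawn from the QR decomposition of a Gaussian matrix, which is one common way to sample random rotations and is not specified by the text.

```python
import numpy as np

def _pair_sum(P, Q, alpha):
    """Sum of exp(-alpha * r_ij^2) over all point pairs (i in P, j in Q)."""
    diff = P[:, None, :] - Q[None, :, :]
    r2 = np.sum(diff * diff, axis=-1)
    return np.exp(-alpha * r2).sum()

def geometric_similarity(A, B, alpha=1.0):
    """S_g for two surfaces given as (n_A, 3) and (n_B, 3) coordinate arrays."""
    norm = np.sqrt(_pair_sum(A, A, alpha) * _pair_sum(B, B, alpha))
    return _pair_sum(A, B, alpha) / norm

def random_rotation(rng):
    """Random proper rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix the sign ambiguity of QR
    if np.linalg.det(q) < 0:         # make it a rotation, not a reflection
        q[:, 0] = -q[:, 0]
    return q

def best_orientation(A, B, alpha=1.0, n_trials=1000, seed=0):
    """Rotate A about its geometric center at random; keep the pose maximizing S_g."""
    rng = np.random.default_rng(seed)
    A0 = A - A.mean(axis=0)          # geometric centers at the coordinate origin
    B0 = B - B.mean(axis=0)
    best_s, best_A = geometric_similarity(A0, B0, alpha), A0
    for _ in range(n_trials):
        candidate = A0 @ random_rotation(rng).T
        s = geometric_similarity(candidate, B0, alpha)
        if s > best_s:
            best_s, best_A = s, candidate
    return best_s, best_A
```

Because the denominator normalizes by each surface's self-overlap, a surface compared with itself gives S_g = 1, which is a convenient sanity check before running the search.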

Input patterns entering into a neural network are presented in the form of a vector
with entries (I_1, I_2, ..., I_n). Since the molecules are represented by a 4 × n matrix, a
method is needed to discard the x, y, z-coordinates but maintain the electrostatic potentials
while preserving the maximum amount [illegible]
the surface points of every molecule onto the same geometrical surface, such as, for
example, that defined by the transition state (Bagdassarian et al., Int. J. Quant. Chem.:
Quant. Biol. Symp. 23:73-80 (199[illegible])). [illegible]
network, with their differences and similarities preserved, a nearest neighbor mapping
function for the surface [illegible]
must be oriented for maximum geometric coincidence, as described above, and each
inhibitor molecule is mapped onto the transition state. For each point on the transition state
surface the spatially closest point on the inhibitor surface is found, and the electrostatic
potential of that inhibitor point is assigned the coordinates of that transition state point.
Therefore the transition state, substrate, and inhibitors are all represented by the same
geometrical surface, that of the transition state. However, the electrostatic potentials on
these surface points defining a particular molecule are derived by the projection of the
electrostatic potentials of [illegible]
with the same geometry, input vectors are created with only the electrostatic potential
information, ignoring the positional information since it is now the same for all molecules.
This mapping ensures that similar regions on different molecules enter the same part of the
neural network. This mapping assumes that the shape of the transition state is matched by
the cavity at the active site. This cavity is responsible for formation of the transition state.
The value of this approach will be shown a posteriori by the results.
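The nearest neighbor mapping described above can be sketched as follows. This is a minimal illustration, assuming the molecule has already been aligned to the transition state and that surfaces use the 4 × n layout (x, y, z, potential) described earlier; the function name `map_onto_reference` is hypothetical.

```python
import numpy as np

def map_onto_reference(reference_xyz, molecule_surface):
    """Project a molecule's potentials onto a reference (transition state) surface.

    reference_xyz: (n_ref, 3) coordinates of the reference surface points.
    molecule_surface: 4 x n matrix with rows x, y, z, potential (aligned molecule).
    Returns a length-n_ref input vector of electrostatic potentials.
    """
    mol_xyz = molecule_surface[:3].T           # (n, 3) coordinates
    mol_pot = molecule_surface[3]              # (n,) potentials
    diff = reference_xyz[:, None, :] - mol_xyz[None, :, :]
    r2 = np.sum(diff * diff, axis=-1)          # (n_ref, n) squared distances
    nearest = np.argmin(r2, axis=1)            # closest molecule point per reference point
    return mol_pot[nearest]
```

Every molecule mapped this way yields a vector of the same length, with entry i always referring to the same location on the reference surface, so the vectors can be fed to the same input neurons.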

Each data point in the input - that is, each discrete point chosen on the van der
Waals surface at which the electrostatic potential is evaluated - enters the neural network
through a discrete neuron. The network is composed of many simple neurons acting in
parallel. The network function is determined by the interaction between these neurons.
Networks "learn" by adjusting the strength of interaction between the neurons. The
network has an input layer, a hidden layer, and an output layer. In the input layer, each
input neuron corresponds to an input datum (in this case, a point on the van der Waals
surface and the associated electrostatic potential, or a deviation of a geometric location from
a reference surface). There is complete interconnection between all neurons in adjacent
layers, and the strength of interconnection is what is varied in training of the network.

The network used in the present invention was a feed forward network with back
propagation of error that learns with momentum. Training of the network is accomplished
by repeated backpropagation of error throughout the network. Each iteration involves the
introduction of an input and an output pattern, calculation of error, and readjustment of
internal parameters called weights and biases. A generalization of the Widrow-Hoff
learning rule was used to modify interconnection weights until presentation of the network
with an electrostatic potential input pattern resulted in output of a known binding free
energy. After the network is trained, it may be presented with an unknown pattern and it
will mathematically generalize to produce an output binding free energy. The number of
input patterns required to train a network varies with the input data; however, as reported
below, between 4 and 7 training inhibitors were sufficient to allow a neural network to
produce accurate predictions for unknown inhibitors. Few input substrates are required
because each input contains hundreds of data points, and therefore a great deal of
information. The small number of input substrates is remarkable from a mathematical
perspective, and makes the method very practical to use.
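A training loop of this kind can be sketched as below. This is a minimal illustration, not the inventors' implementation: it assumes a single hidden layer, tanh transfer functions at both layers, full-batch gradient descent on the mean squared error, and targets (for example ΔG/RT values) that have been rescaled into the interval (-1, 1) beforehand. The function name `train` and its default hyperparameters are hypothetical.

```python
import numpy as np

def train(X, t, n_hidden=4, lr=0.1, momentum=0.8, n_iter=5000, seed=0):
    """Backpropagation with momentum for a one-hidden-layer tanh network.

    X: (n_patterns, n_inputs) input vectors; t: (n_patterns,) targets in (-1, 1).
    Returns the parameters and the mean squared error at every iteration.
    """
    rng = np.random.default_rng(seed)
    n_patterns, n_in = X.shape
    W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=n_hidden)
    b2 = 0.0
    vW1, vb1 = np.zeros_like(W1), np.zeros_like(b1)
    vW2, vb2 = np.zeros_like(W2), 0.0
    history = []
    for _ in range(n_iter):
        # Forward pass through hidden and output layers.
        h = np.tanh(X @ W1 + b1)                        # (n_patterns, n_hidden)
        y = np.tanh(h @ W2 + b2)                        # (n_patterns,)
        err = y - t
        history.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of 0.5 * mean squared error.
        d_out = err * (1.0 - y ** 2) / n_patterns       # (n_patterns,)
        d_hid = np.outer(d_out, W2) * (1.0 - h ** 2)    # (n_patterns, n_hidden)
        gW2, gb2 = h.T @ d_out, d_out.sum()
        gW1, gb1 = X.T @ d_hid, d_hid.sum(axis=0)
        # Momentum update: each step blends the previous step with the new gradient.
        vW1 = momentum * vW1 - lr * gW1
        vb1 = momentum * vb1 - lr * gb1
        vW2 = momentum * vW2 - lr * gW2
        vb2 = momentum * vb2 - lr * gb2
        W1, b1, W2, b2 = W1 + vW1, b1 + vb1, W2 + vW2, b2 + vb2
    return (W1, b1, W2, b2), history
```

With only a handful of training patterns, the recorded error history is a practical way to judge whether a given choice of hidden layer size, learning rate, and momentum has converged.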

The basic construction of a back propagation neural network has three layers: an
input layer, a hidden layer, and an output layer (Fig. 3). The input layer is where the
different input vectors are transferred. The link between the layers of the network is one
of multiplication by a weight matrix, where every entry in the input vector is multiplied by
a weight and sent to every hidden layer neuron, so that the hidden layer weight matrix has
the dimensions n by m, where n is the length of the input vector and m is the number of
hidden layer neurons. A bias is added to the hidden and output layer neurons, which scales
all the arguments before they are input into the transfer function. The hidden layer input
h^I_j for neuron j is calculated,



\[
h^{I}_{j} = b_{j} + \sum_{i=1}^{n} x^{O}_{i}\, w_{ij}
\]

where x^O_i is the output from the ith input neuron, w_ij is the element of the weight matrix
connecting input from neuron i with hidden layer neuron j, and b_j is the bias on the hidden
layer neuron j. This vector h^I_j is sent through a transfer function, f. This function is
nonlinear and usually sigmoidal, taking any value and returning a number between -1 and
1. A typical example is:

[equation not legible in the source]



The hidden layer output, h^O_j, is then sent to the output layer. The output layer input o^I_k
is calculated for the kth output neuron,

\[
o^{I}_{k} = b_{k} + \sum_{j=1}^{m} h^{O}_{j}\, w_{jk}
\]

where w_jk is the weight matrix element connecting hidden layer neuron j with output layer
neuron k. The output layer output, o^O_k, is calculated with the same transfer function given
above.
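The forward pass through the two layers can be sketched as below. This is a minimal illustration; the function name `forward` is hypothetical, and the transfer function is assumed to be tanh, one common sigmoidal function with outputs between -1 and 1. The transfer function actually used may differ.

```python
import numpy as np

def forward(x, W_ih, b_h, W_ho, b_o, f=np.tanh):
    """Forward pass of a three-layer network.

    x: (n,) input vector; W_ih: (n, m) input-to-hidden weights; b_h: (m,) biases;
    W_ho: (m, k) hidden-to-output weights; b_o: (k,) biases; f: transfer function.
    """
    h_in = b_h + x @ W_ih       # hidden layer input: bias plus weighted inputs
    h_out = f(h_in)             # hidden layer output
    o_in = b_o + h_out @ W_ho   # output layer input: bias plus weighted hidden outputs
    return f(o_in)              # output layer output
```

For a binding free energy prediction the output layer has a single neuron (k = 1), so the function returns a one-element array that would be rescaled back to ΔG/RT units.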



Referring to Figure 3, the input layer is represented by the squares at the top of the
diagram. The weights are represented by the lines connecting the layers: w_ij is the weight
between the ith neuron of the input layer and the jth neuron of the hidden layer, and w_jk is
the weight between the jth neuron of the hidden layer and the kth neuron of the output layer.
In this diagram the output layer has only one neuron because the target pattern is a single
number - the free energy of binding. Only a single output neuron is needed if the target for
each input vector is a single number.

Backpropagation was created [illegible]
applied to multiple-layer networks and nonlinear differentiable transfer functions
(Rumelhart et al., Parallel Distributed Processing, Vol. 1, MIT Press, 1986). Input vectors
and the corresponding output vectors are used to train until the network can approximate
a function. The strength of a back propagation neural network is its ability to form internal
representations through the use of a hidden layer of neurons. For example, [illegible]
with output targets [illegible], respectively. A perceptron [illegible]
would be unable to simulate the function described by these four [illegible]
pairs. The only way to solve this problem is to learn that the [illegible]
together to affect the output. In this case the least similar inputs cause the same output.
[illegible] like that required to find the best inhibitor when it does not look like
the transition state. It is this inherent ability of neural networks to solve [illegible]
that makes them well conditioned for the task of simulating biological molecular
recognition.
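The role of the hidden layer can be illustrated with a small deterministic sketch. The four input/output pairs of the source's example are not legible, so the standard exclusive-or truth table (encoded as -1/+1) is used here as an assumed stand-in; the hand-set weights and the function names are hypothetical and chosen only to make the point. A single linear unit fitted by least squares cannot separate the patterns, while two tanh hidden units feeding one tanh output reproduce them.

```python
import numpy as np

# Four two-bit inputs and exclusive-or targets, encoded as -1 / +1 (assumed example).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([-1.0, 1.0, 1.0, -1.0])

def best_linear_fit(X, T):
    """Least-squares fit of a single linear unit (two weights plus a bias)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, T, rcond=None)
    return A @ coef

def hidden_layer_net(X):
    """Two tanh hidden units (OR-like and AND-like) feeding one tanh output unit."""
    W_ih = np.array([[10.0, 10.0], [10.0, 10.0]])
    b_h = np.array([-5.0, -15.0])
    h = np.tanh(X @ W_ih + b_h)
    # Output is positive only when the OR-like unit fires and the AND-like unit does not.
    return np.tanh(10.0 * (h[:, 0] - h[:, 1] - 1.0))

linear_prediction = best_linear_fit(X, T)     # no information about the targets
network_prediction = hidden_layer_net(X)      # matches the sign of every target
```

Here the least similar inputs, (0, 0) and (1, 1), share the same target, which is exactly the situation a network without a hidden layer cannot represent.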
We report a study of [illegible]
with their respective transition states, substrates, and inhibitors. Many different neural
network constructions were studied, and the best neural network architecture varies with
the enzyme system. Variations in the number of hidden layer neurons often caused the
greatest change in the ability of the network to learn, and between four and twelve hidden
layer neurons were used. Changing the number of iterations between 5 × 10^[illegible] and
1 × 10^[illegible] also had an effect on the ability of a network to learn. The learning rate
controls the rate of change of the weights and biases, affecting a network's ability to
converge; values ranging between 0.1 and 0.5 were used. A momentum term between 0.8
and 0.9 increased the probability that the network will converge at the global error minimum
instead of a local error minimum.

AMP Nucleosidase and Adenosine Deaminase



The success of the trained neural networks in predicting ΔG/RT for enzyme-
inhibitor and enzyme-transition state interactions for AMP nucleosidase and adenosine
deaminase is shown in Table 1. The second column reports the experimentally
determined free energies of binding for the transition states, substrates, or inhibitors shown
in the first column. In the third column, the measure Se is used to rank similarity to the
transition states. The fourth column gives predictions of ΔG/RT based on the Se values of
the molecules. For these calculations, the binding free energies of the transition states and
of the substrates are defined by the experimental values. Predicted values of ΔG/RT based
on the Se values for the inhibitors were made by linear extrapolation between the values for
the transition states and the substrates.

The training procedure for the neural network for these two systems involved
training the network with four patterns for each system. The number of hidden layer
neurons, number of iterations, learning rate, and momentum were adjusted until the
network output the binding energies of the four molecules in the training set with 98%
accuracy. Once the network had learned the four patterns in the training set, it was used
to predict a binding energy for the fifth molecule. These are the numbers listed in the fifth
column of Table 1. For enzyme systems with few members in the training sets, the number
of hidden layer neurons, iterations, learning rate, and momentum that give the best
predictions for the test molecules were optimized for the four patterns in each training set.



Table 1

Enzyme/Molecule                            ΔG/RT            Se       ΔG/RT        ΔG/RT
                                           (experimental)            (Se)         (neural network)

AMP Nucleosidase
transition state                           -39              1.000    -39 (0%)     -33 (15%)
formycin                                   -17              0.434    -18 (6%)     -17 (0%)
aminopyrazolo pyrimidine ribonucleotide    -12              0.310    -14 (17%)    -15 (25%)
tubercidin                                 -9.9             0.298    -13 (36%)    -9.3 (6%)
AMP                                        -9.0             0.173    -9.0 (0%)    -10 (11%)

Adenosine Deaminase
transition state                           -39              1.000    -39 (0%)     -29 (26%)
hydrated purine ribonucleoside             -29              0.765    -27 (7%)     -29 (0%)
(R)-coformycin                             -25              0.604    -19 (24%)    -16 (36%)
1,6-dihydropurine ribonucleoside           -12              0.677    -23 (92%)    -14 (17%)
adenosine                                  -10              0.428    -10 (0%)     -11 (10%)



For AMP nucleosidase, the errors in ΔG/RT as predicted by Se are: 0.0 for the
transition state (by construction), 1.0 (6%) for formycin, 2.0 (17%) for aminopyrazolo
pyrimidine ribonucleotide, 3.1 (36%) for tubercidin, and 0.0 for AMP (again, by
construction). For the neural network predictions the following errors are found: 6.0
(15%) for the transition state, 0.0 for formycin, 3.0 (25%) for the aminopyrazolo
pyrimidine ribonucleotide, 0.6 (6%) for tubercidin, and 1.0 (11%) for substrate. For
adenosine deaminase, the errors in ΔG/RT as predicted by Se are: 0.0 for the transition state
(by construction), 2.0 (7%) for hydrated purine ribonucleoside, 6.0 (24%) for (R)-
coformycin, 11.0 (92%) for 1,6-dihydropurine ribonucleoside, and 0.0 for adenosine (again,
by construction).

For AMP nucleosidase, the errors in ΔG/RT as predicted by the neural network are:
6.0 (15%) for the transition state, 0.0 (0%) for formycin, 3.0 (25%) for aminopyrazolo
pyrimidine ribonucleotide, 0.6 (6%) for tubercidin, and 1.0 (11%) for AMP. For adenosine
deaminase, the errors in ΔG/RT as predicted by the neural network are: 10.0 (26%) for the
transition state, 0.0 (0%) for hydrated purine ribonucleoside, 9.0 (36%) for (R)-coformycin,
2.0 (17%) for 1,6-dihydropurine ribonucleoside, and 1.0 (10%) for adenosine.

For the three binding constants predicted by Se, the average error is 20% of the
experimental ΔG/RT for AMP nucleosidase, and 41% for adenosine deaminase. Even for
such a small training set, the error from Se is quite large in the case of adenosine
deaminase, and this is mainly because 1,6-dihydropurine ribonucleoside is not a good
inhibitor. Without it, the average error is 15%.

The neural network, for the five predictions, performs with 11% error in ΔG/RT in
the case of AMP nucleosidase, and 18% error in the case of adenosine deaminase. The
neural network is poorer at predicting the transition state binding free energy for adenosine
deaminase. Nonetheless, the average error over the five molecules in the adenosine
deaminase series is only 18%. Both the neural network and the similarity measure have
difficulty in predicting binding energy for (R)-coformycin, because its ring structure is
sufficiently different from the other molecules of the training set.
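The averages quoted above follow directly from the percentage errors in Table 1, as the short check below shows. It is only an arithmetic illustration: the dictionary names are hypothetical, and the Se averages cover the three inhibitors per enzyme, since the transition state and substrate values are fixed by construction.

```python
# Percentage errors transcribed from Table 1.
SE_INHIBITOR_ERRORS = {
    "AMP nucleosidase": [6, 17, 36],        # formycin, aminopyrazolo..., tubercidin
    "adenosine deaminase": [7, 24, 92],     # three inhibitors
}
NN_ERRORS = {
    "AMP nucleosidase": [15, 0, 25, 6, 11],       # all five molecules
    "adenosine deaminase": [26, 0, 36, 17, 10],   # all five molecules
}

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

se_average = {name: mean(errs) for name, errs in SE_INHIBITOR_ERRORS.items()}
nn_average = {name: mean(errs) for name, errs in NN_ERRORS.items()}

for name in SE_INHIBITOR_ERRORS:
    print(f"{name}: Se {se_average[name]:.1f}%, neural network {nn_average[name]:.1f}%")
```

The Se average for adenosine deaminase is dominated by the single 92% error, which is why removing that molecule changes the figure so sharply.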



Cytidine Deaminase 



Cytidine deaminase catalyzes the hydrolysis of the amine group on cytidine to yield
the products uridine and ammonia. Besides the transition state for the reaction and the
substrate, there were 10 other compounds available in the literature for which binding free
energies had been measured (Betts et al., J. Mol. Biol. 235:635-656 (1994); Frick et al.,
Biochemistry 28:9423-9430 (1989); Horenstein et al., Biochemistry 32:7089-7097 (1993)).
Figure 4 shows the methyl derivatives (replacing the ribose ring) of the molecules used in
these experiments. Methyl derivatives were used because the ribose moiety is unchanged
in all 12 molecules and so this constant factor remained the same for all potential inhibitors.
There is no transition state structure available for this molecule, but there is a crystal
structure available for the enzyme complexed with the transition state analog
5-fluoropyrimidin-2-one ribonucleoside, and this structure was used as a starting point for a
transition state model. The reaction mechanism was assumed to be similar to that for
adenosine deaminase. The C4 to O (of the attacking -OH) distance is constrained to be
1.67 Å, corresponding to that found in the crystal structure of the enzyme-inhibitor complex.
The remainder of the molecule was energy minimized, using the GAUSSIAN 94 package as
described above. In this case, because we employ the methyl derivatives of the molecules,
conformation about flexible bonds was not a significant factor in the calculations.

There is another significant difference in the cytidine deaminase system with its
more diverse set of inhibitors. In particular, the halogenated inhibitors represent a new
challenge to the approach. In addition to changing the electrostatic features at the van der
Waals surface, the halogen substituted inhibitors differ in size so significantly from the
transition state reference surface that geometric information needs to be included to derive
the best results from the neural network approach. In addition to electrostatic information,
the neural network was presented with a second set of data which gives the deviation of the
surface points from a reference molecule chosen to be large enough that all other molecules
were contained within its volume (in particular, the 5-bromo substituted surface).

The success of the proposed methodology is demonstrated by the results shown in
Table 2. Referring now to Table 2, the ΔG/RT values calculated by the neural network are
shown using 7 and 11 molecules to train the network. In the case of 7 molecules, an actual
experiment was simulated by choosing five molecules to leave out of the training set.
These molecules were chosen without regard to chemical structure, but rather were chosen
to span the range of binding free energy. Of these five, one was chosen randomly to design
the neural network architecture (i.e., the network's adjustable parameters - number of
hidden layer neurons, learning rate, momentum, and number of learning iterations). These
parameters were adjusted so that the approach could accurately predict the known binding
free energy of the latter molecule. No further adjustment to the network was made beyond
this stage. Finally, this trained network was used to predict the binding free energies of the
remaining four unknown molecules. As Table 2 shows, the approach is able to predict the
binding free energies, and is even able to yield reasonable results when trained with only
7 known experimental values, and when the network is optimized to predict the binding
free energy of a randomly chosen molecule.



Table 2

Molecule                                            ΔG/RT            Se      ΔG/RT         ΔG/RT (neural network)
                                                    (experimental)           (Se)          7             11

transition state                                    -36              1.00    -36 (0%)                    -30 (20%)
hydrated pyrimidine-2-one ribonucleoside            -27              0.87    -30 (11%)     -26 (4%)      -27 (0%)
hydrated 5-fluoropyrimidine-2-one ribonucleoside    -24              0.78    -26 (8%)                    -19 (21%)
transition state for 5,6-dihydrocytidine            -21              0.88    -30 (43%)                   -23 (9%)
hydrated 5-chloropyrimidine-2-one ribonucleoside    -19              0.70    -22 (16%)     -18 (5%)      -19 (0%)
hydrated 5-bromopyrimidine-2-one ribonucleoside     -18              0.68    -21 (17%)                   -17 (6%)
3,4,5,6-tetrahydrouridine                           -16              0.76    -25 (56%)                   [illegible]
3,4-dihydrozebularine                               -10              0.68    -22 (120%)    -13 (30%)     -12 (20%)
cytidine                                            -9.9             0.39    -9.9 (0%)                   -8.3 (16%)
5,6-dihydrocytidine                                 -9.1             0.28    -5.3 (42%)                  -7.5 (18%)
uridine                                             -6.0             0.58    -18 (200%)    -9.9 (74%)    -6.1 (2%)
5,6-dihydrouridine                                  -5.7             0.45    -12 (111%)                  -6.2 (9%)



Nitric Oxide Synthetase



To further test our approach, we have also investigated inhibitors for two different
isoforms of nitric oxide synthetase (NOS). There has been an explosion of interest in
recent years in the biological importance of NO, and its synthesis in living systems. The
two isoforms studied are the brain isoform (hereinafter "bNOS") and the inducible, Ca²⁺-
[illegible] different biomedical [illegible]
has been implicated in cell death, and overproduction of iNOS has been implicated in
circulatory shock and excess inflammation. Consequently, selective inhibition of either
isoform as appropriate would have significant potential medical benefits. In addition,
potential inhibitors of NOS vary widely in chemical structure (Fig. 5).

Four specific challenges are presented to the method of the present invention by the
NOS system. First, the great diversity in geometry of the inhibitors forces the method to
show if widely variable geometric and electrostatic structures can all be handled by the
same neural construction. Second, the NOS reaction is known to be an extraordinarily
complex biochemical reaction, and so provides a rigorous test of the claim that predictions
of binding free energies can be made by examining the quantum properties of substrate or
inhibitor molecules. Third, most of the molecules we study are linear chain molecules,
predominantly connected by single bonds between atoms, and so have a high degree of
conformational flexibility. This system provides a convincing test of the ability of the
method to predict binding in such conformationally flexible systems when all molecules
are held in an extended conformation. Fourth, we have data available on a relatively large
set of inhibitors for two different isoforms which not only exhibit quantitatively different
binding energies, but qualitatively different binding patterns (i.e., the order of binding free
energies shifts from one isoform to the other).

Referring to Figure 5, the 18 molecules studied vary widely in structure, and in
cases where clearly only the guanidino group was present (or similarly an isothiourea), the
central carbon atom was aligned with the guanidino carbon of the arginine analogues. As
in the cytidine deaminase study, a set of 12 randomly chosen molecules was used in the
training set, along with a molecule with a known binding energy to optimize the
construction of the neural network. Then, the free energies of binding for the five
remaining molecules were calculated.

In each case, two sets of data are presented which represent two different choices
of molecules for which predictions must be generated. In the second set, molecule number
1, a [illegible] derivative, was included in the prediction set because it is very different from
the other molecules in the training set, and we felt it would be a strong test of the algorithm.
The results for iNOS are shown in Table 3, and those for bNOS are shown in Table 4.



Table 3 



Molecule           ΔG/RT            ΔG/RT             ΔG/RT
(from Fig. 5)      (experimental)   (prediction #1)   (prediction #2)

1                  -18.44                             -16.68
2                  -17.78           -18.08
3                  -17.73
4                  -17.03                             -15.22
5                  -15.94
6                  -13.28           -11.90
7                  -12.94
8                  -12.90                             -10.17
9                  -12.53
10                 -12.21           -10.75
11                 -11.88
12                 -11.68                             -11.36
13                 -11.65
14                 -11.04           -12.84
15                 -10.80
16                 -10.74                             -12.88
17                 -9.42
18                 -8.11            -11.05

average deviation:                  1.58              1.75
The data for iNOS are fairly uniform, with average deviations of 1.58 and 1.75
dimensionless energy units. This level of accuracy is very surprising, given the complexity
of the enzymatic reaction being [illegible]
method to highly variable and flexible molecules with minimal information about binding.
The data for bNOS show slightly less absolute accuracy, but because the brain isoform
binds inhibitors more tightly than iNOS, the relative error is only about 16%, even in the
worst case. The range of binding energies is greater than 10 dimensionless energy units for
iNOS and 14.5 dimensionless energy units for bNOS, so even with this highly variable and
flexible set of inhibitor molecules, the results are accurate to within about 15% of the
binding energy range.
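The average deviations and the size of the iNOS energy range can be checked directly from the Table 3 values. The sketch below is only an arithmetic illustration with hypothetical variable names; it takes the average deviation to be the mean absolute difference between predicted and experimental ΔG/RT, which is an assumption that reproduces the tabulated 1.58 and 1.75.

```python
# Experimental dG/RT values for iNOS, keyed by molecule number (Table 3).
EXPERIMENTAL = {
    1: -18.44, 2: -17.78, 3: -17.73, 4: -17.03, 5: -15.94, 6: -13.28,
    7: -12.94, 8: -12.90, 9: -12.53, 10: -12.21, 11: -11.88, 12: -11.68,
    13: -11.65, 14: -11.04, 15: -10.80, 16: -10.74, 17: -9.42, 18: -8.11,
}
# The two prediction sets, each covering five held-out molecules.
PREDICTION_1 = {2: -18.08, 6: -11.90, 10: -10.75, 14: -12.84, 18: -11.05}
PREDICTION_2 = {1: -16.68, 4: -15.22, 8: -10.17, 12: -11.36, 16: -12.88}

def average_deviation(predicted, experimental):
    """Mean absolute difference between predicted and experimental values."""
    return sum(abs(predicted[k] - experimental[k]) for k in predicted) / len(predicted)

dev_1 = average_deviation(PREDICTION_1, EXPERIMENTAL)
dev_2 = average_deviation(PREDICTION_2, EXPERIMENTAL)
# Span of experimental binding energies across all 18 molecules.
energy_range = max(EXPERIMENTAL.values()) - min(EXPERIMENTAL.values())

print(f"deviation #1: {dev_1:.2f} ({100 * dev_1 / energy_range:.0f}% of range)")
print(f"deviation #2: {dev_2:.2f} ({100 * dev_2 / energy_range:.0f}% of range)")
```

Expressing the deviation as a fraction of the energy range is what allows predictions for differently sized data sets, such as the two isoforms, to be compared on a common footing.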

IU-Nucleoside Hydrolase 

The IU-nucleoside hydrolase system is involved in purine salvage by parasites from
hosts. The enzyme was studied for two reasons. First, the mechanism of this enzyme is
known to be very similar to that of the enzymatic subunit of cholera toxin. Inhibitors of
IU-nucleoside hydrolase are not inhibitors of cholera toxin, due to the presence of a
dinucleotide as opposed to a mononucleotide. However, the ability to predict binding
patterns in inhibitors of the IU-nucleoside hydrolase will permit the identification of
possible variants of these inhibitors for testing as cholera toxin inhibitors. Second, the
existence of a large group of recently synthesized but uncharacterized inhibitors allowed
a realistic test of the method (see Fig. 7).

To train the neural network, an older set of 22 inhibitors of known binding free
energy was used (Fig. 6). The binding constants of the molecules shown in Fig. 7 were
then calculated, and the results are presented in Table 5. Accurate binding free energies
could not be calculated for molecules which bind more weakly than Ki = 50 μM.
Therefore, for several molecules, the experimental analysis could only show that the
molecules bound at an absolute value less than 9.81 dimensionless units. However, the
remainder of the data are in good agreement with the experimental data, with the worst
errors being about 15%. These results illustrate the utility of the method of the invention
for distinguishing strong binders deserving of further study (i.e., those having ΔG/RT values
less than -14) from weak binders.



The high level of predictive accuracy of neural networks makes it interesting to
study how networks discriminate between different regions on the electrostatic potential
surface of inhibitor molecules. The evident accuracy of the method is due to the use of the
entire three dimensional surface of the molecule, rather than a collapsed representation.
The weights in the hidden layer [illegible] with regions of [illegible] electrostatic [illegible]
that [illegible] binding have large absolute values. The network is presented with an
input pattern and an output pattern. To minimize [illegible]
regions that change and affect the binding energy relative to those regions that change and
do not affect binding energy. This recognition occurs when the neural network's weights
are adjusted so that important regions are multiplied by large weights and [illegible]
regions are multiplied by small weights. Documentation of this behavior is made by
inspection of the absolute values of every number in the hidden layer weight [illegible]
trained network. The matrix is collapsed into a vector V_i by summing on j, where
j = 1, ..., m and m is the number of hidden layer neurons:

\[
V_{i} = \sum_{j=1}^{m} \left| w_{ij} \right|
\]

and where i refers to the input surface points. Large values for V_i represent regions
found to be important to the neural network, and small values represent regions found to
be unimportant. Since all [illegible]
of different molecules onto the transition state surface, the common transition state
geometry is used to identify those regions on the molecules found as most important by the
neural net. This can be represented by, for example, coloring points on a van der Waals
surface with large V_i values one color and regions with small values another color.
Regions on molecular surfaces [illegible]
The network is not only able to identify regions [illegible],
it can also ignore regions that change without affecting binding.
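The collapse of the hidden layer weight matrix into a per-point importance vector can be sketched as follows. This is a minimal illustration that assumes the weight matrix is stored with one row per input surface point and one column per hidden layer neuron; the function names and the choice of flagging a fixed fraction of points are hypothetical.

```python
import numpy as np

def region_importance(W_ih):
    """Sum of absolute input-to-hidden weights for each input surface point.

    W_ih: (n_points, m) weight matrix; returns a length-n_points vector.
    """
    return np.abs(W_ih).sum(axis=1)

def most_important_points(W_ih, fraction=0.1):
    """Indices of the surface points with the largest importance values."""
    importance = region_importance(W_ih)
    n_top = max(1, int(round(fraction * len(importance))))
    # argsort is ascending, so reverse it to list the largest values first.
    return np.argsort(importance)[::-1][:n_top]
```

Because every molecule is mapped onto the same reference surface, the returned indices identify fixed locations on that surface and can be used directly to color its points.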

The entire molecular electrostatic potential surfaces of the inhibitors and of 
experimentally determined transition states can be used to train neural networks to 
accurately predict binding energies of proposed inhibitor molecules. The neural network 
method possesses the ability to adjust a model of the system defined by a relatively small 
number of structure-affinity pairs. Our calculations show the ability of the method to 
predict an enzyme's affinity for inhibitor molecules when minimal information about the 
enzymatic active site is provided. 

The predictive power extends to the tightly bound transition state [...] when the network 
is trained with less tightly bound inhibitors. Similar methods have been used by other 
groups for the task of simulating [...]. The present method 
uses the entire electrostatic potential surfaces of the molecules as the inputs to a back 
propagation neural network. Our surface transformation has some similarity with the 
procedure of Gasteiger et al. where Kohonen self-organizing networks were used to 
transform different three-dimensional [...] 
the molecular electrostatic potential surfaces for each molecule onto twelve autocorrelation 
coefficients. Importantly, these methods necessarily reduce the amount of information 
being used for prediction. Wagener et al. report an investigation of the binding affinity 
between 32 molecules and a receptor site. Because these molecules are constructed with 
similar steroid backbones, there is no confusion as to how to orient the molecules with 
respect to each other. The present method can be applied to this system as well. Tetko et 
al. used an approach similar to that used by Wagener et al. They devised a protocol that can 
be used to describe the structural features of molecules with a small set of coefficients. 
These sets of coefficients were used as inputs to a neural network. 

Previous work with similarity measures gives equal weight to all the regions of the 
molecular surfaces, while neural networks become sensitive to certain regions and less 
sensitive to others. Enzyme-substrate binding occurs through a number of specific 
interactions that do not cover the entire molecular surface. Binding energy is not always 
a linear function of similarity to the transition state. Neural networks can also learn to 
recognize regions of inhibitors likely to be chemically modified by an enzyme. The neural 
network method is well suited for the task of simulating biological molecular recognition. 
Such methods can be used to search chemical libraries to augment the process of 
discovering pharmacological transition state inhibitors. 

It would be appreciated by those in the art that in addition to the electrostatic 
potential, other parameters descriptive of interactions between enzymes and substrates or 
inhibitors, and between receptors and ligands, could be utilized advantageously in the 
present method. These could include, for example, hydrophobic interactions, polarization 
effects, steric effects, and geometric effects. It would also be appreciated by those in the 
art that the present method can be encoded as information on a medium readable by a 
general-purpose computer to enable a computer to perform the necessary calculations. 

All patents and references mentioned hereinabove are hereby incorporated by 
reference in their entirety. While the foregoing invention has been described in some detail 
for purposes of clarity and understanding, it will be appreciated by one skilled in the art 
from a reading of the disclosure that various changes in form and detail can be made 
without departing from the true scope of the invention in the appended claims. 



